Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes

Author

  • Satinder P. Singh
Abstract

Reinforcement learning (RL) has become a central paradigm for solving learning-control problems in robotics and artificial intelligence. RL researchers have focussed almost exclusively on problems where the controller has to maximize the discounted sum of payoffs. However, as emphasized by Schwartz (1993), in many problems, e.g., those for which the optimal behavior is a limit cycle, it is more natural and computationally advantageous to formulate tasks so that the controller's objective is to maximize the average payoff received per time step. In this paper I derive new average-payoff RL algorithms as stochastic approximation methods for solving the system of equations associated with the policy evaluation and optimal control questions in average-payoff RL tasks. These algorithms are analogous to the popular TD and Q-learning algorithms already developed for the discounted-payoff case. One of the algorithms derived here is a significant variation of Schwartz's R-learning algorithm. Preliminary empirical results are presented to validate these new algorithms.
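For reference, here is one standard way to write the systems of equations behind the policy evaluation and optimal control questions. It assumes a finite, unichain MDP with expected one-step payoff R(i, a) and transition probabilities P_ij(a). The notation is ours and may differ from the paper's. The differential values V are determined only up to an additive constant. A TD- or Q-learning-style algorithm replaces the expectation over the next state j with a single sampled transition.

```latex
\begin{aligned}
\rho^{\pi} &= \lim_{N\to\infty} \frac{1}{N}\,
  \mathbb{E}\Big[\textstyle\sum_{t=0}^{N-1} r_t \,\Big|\, \pi\Big]
  && \text{(average payoff of policy } \pi) \\
\rho^{\pi} + V^{\pi}(i) &= R(i,\pi(i)) + \textstyle\sum_{j} P_{ij}(\pi(i))\, V^{\pi}(j)
  && \text{(policy evaluation)} \\
\rho^{*} + V^{*}(i) &= \max_{a}\Big[R(i,a) + \textstyle\sum_{j} P_{ij}(a)\, V^{*}(j)\Big]
  && \text{(optimal control)} \\
Q^{*}(i,a) &= R(i,a) - \rho^{*} + \textstyle\sum_{j} P_{ij}(a)\, \max_{b} Q^{*}(j,b)
  && \text{(Q form)}
\end{aligned}
```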

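To make the TD-style and Q-learning-style updates concrete, here is a minimal Python sketch on a hypothetical two-state, two-action MDP. The `step` function, the rewards and all step sizes are our assumptions, not the paper's. `average_reward_td0` evaluates a fixed policy; as an assumption of this sketch, it estimates ρ by the running mean of observed rewards. `r_learning` is R-learning as it is commonly stated for Schwartz's algorithm. It is not the variation derived in this paper.

```python
import random

# Hypothetical toy MDP (an assumption for illustration only).
# Action 0 = "stay" (reward 0.2 in state 0, 1.0 in state 1); action 1 = "switch" (reward 0).
REWARD_STAY = {0: 0.2, 1: 1.0}


def step(s, a):
    """Return (reward, next_state) for taking action a in state s."""
    if a == 0:
        return REWARD_STAY[s], s
    return 0.0, 1 - s


def average_reward_td0(policy, steps=100_000, alpha=0.005, seed=0):
    """Estimate rho and differential values V for a fixed policy.

    rho is the running mean of observed rewards (an assumption of this sketch);
    V is updated with the average-reward TD(0) error r - rho + V(s') - V(s).
    """
    rng = random.Random(seed)
    V = [0.0, 0.0]
    rho, s = 0.0, 0
    for t in range(1, steps + 1):
        a = policy(s, rng)
        r, s_next = step(s, a)
        rho += (r - rho) / t                    # exact running average of rewards
        delta = r - rho + V[s_next] - V[s]      # average-reward TD error
        V[s] += alpha * delta
        s = s_next
    return rho, V


def r_learning(steps=50_000, alpha=0.1, beta=0.01, epsilon=0.1, seed=0):
    """R-learning as commonly stated (Schwartz 1993) with epsilon-greedy exploration."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0], [0.0, 0.0]]
    rho, s = 0.0, 0
    for _ in range(steps):
        greedy_a = max((0, 1), key=lambda b: Q[s][b])
        a = rng.randrange(2) if rng.random() < epsilon else greedy_a
        r, s_next = step(s, a)
        # Q-learning-style update with the payoff measured relative to rho.
        Q[s][a] += alpha * (r - rho + max(Q[s_next]) - Q[s][a])
        if a == greedy_a:  # rho is adjusted only on greedy steps
            rho += beta * (r - rho + max(Q[s_next]) - max(Q[s]))
        s = s_next
    policy = [max((0, 1), key=lambda b: Q[i][b]) for i in (0, 1)]
    return rho, policy
```

For the uniformly random policy on this toy MDP, the exact values are ρ = 0.3 and V(1) − V(0) = 0.4. The optimal policy switches out of state 0 and stays in state 1, with ρ* = 1.0. The estimates returned above should land near these values, for example via `average_reward_td0(lambda s, rng: rng.randrange(2))` and `r_learning()`.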
Similar sources

Non-Deterministic Policies In Markovian Processes

Markovian processes have long been used to model stochastic environments. Reinforcement learning has emerged as a framework to solve sequential planning and decision making problems in such environments. In recent years, attempts were made to apply methods from reinforcement learning to construct adaptive treatment strategies, where a sequence of individualized treatments is learned from clinic...

Human learning in non-Markovian decision making

Humans can learn under a wide variety of feedback conditions. Particularly important types of learning fall under the category of reinforcement learning (RL) where a series of decisions must be made and a sparse feedback signal is obtained. Computational and behavioral studies of RL have focused mainly on Markovian decision processes (MDPs), where the next state and reward depend only on the c...

Learning Without State-Estimation in Partially Observable Markovian Decision Processes

Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately, all of the theory and much of the practice (see Barto et al. for an exception) of RL is limited to Markovian decision processes (MDPs). Many real-world decision tasks, however, are inherently non-Markovian, i.e., the state of the environment is only incomp...

A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon

Many reinforcement learning algorithms, like Q-Learning or R-Learning, correspond to adaptive methods for solving Markovian decision problems in infinite-horizon when no model is available. In this article we consider the particular framework of non-stationary finite-horizon Markov Decision Processes. After establishing a relationship between the finite-horizon total reward criterion and the avera...

Non-Deterministic Policies in Markovian Decision Processes

Markovian processes have long been used to model stochastic environments. Reinforcement learning has emerged as a framework to solve sequential planning and decision-making problems in such environments. In recent years, attempts were made to apply methods from reinforcement learning to construct decision support systems for action selection in Markovian environments. Although conventional meth...

Publication date: 1994